14 research outputs found

    Efficient-VRNet: An Exquisite Fusion Network for Riverway Panoptic Perception based on Asymmetric Fair Fusion of Vision and 4D mmWave Radar

    Full text link
    Panoptic perception is essential to unmanned surface vehicles (USVs) for autonomous navigation. Current panoptic perception schemes are mainly vision-only, that is, object detection and semantic segmentation are performed simultaneously based on camera sensors. Nevertheless, the fusion of camera and radar sensors is regarded as a promising method that could substitute for pure-vision methods, yet almost all existing works focus on object detection only. Therefore, how to fully and subtly fuse the features of vision and radar to improve both detection and segmentation remains a challenge. In this paper, we focus on riverway panoptic perception based on USVs, a considerably unexplored field compared with road panoptic perception. We propose Efficient-VRNet, a model based on Contextual Clustering (CoC) and the asymmetric fusion of vision and 4D mmWave radar, which treats the vision and radar modalities fairly. Efficient-VRNet can simultaneously perform detection and segmentation of riverway objects and drivable-area segmentation. Furthermore, we adopt an uncertainty-based panoptic perception training strategy to train Efficient-VRNet. In the experiments, Efficient-VRNet achieves better performance on our collected dataset than other uni-modal models, especially in adverse weather and environments with poor lighting conditions. Our code and models are available at https://github.com/GuanRunwei/Efficient-VRNet
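
    The abstract does not detail the uncertainty-based training strategy; a minimal PyTorch sketch of homoscedastic-uncertainty task weighting in the style of Kendall et al., assuming three task losses (detection, object segmentation, drivable-area segmentation), could look like the following. The module name and the choice of three tasks are assumptions for illustration, not the paper's actual implementation.

```python
import torch
import torch.nn as nn

class UncertaintyWeightedLoss(nn.Module):
    """Hypothetical sketch: weight per-task losses by learned log-variances,
    as in homoscedastic-uncertainty multi-task training (Kendall et al.).
    The actual Efficient-VRNet strategy may differ."""

    def __init__(self, num_tasks: int = 3):
        super().__init__()
        # One learnable log-variance per task (detection, object seg, drivable-area seg).
        self.log_vars = nn.Parameter(torch.zeros(num_tasks))

    def forward(self, task_losses):
        # task_losses: iterable of scalar loss tensors, one per task.
        total = 0.0
        for i, loss in enumerate(task_losses):
            precision = torch.exp(-self.log_vars[i])
            total = total + precision * loss + self.log_vars[i]
        return total

# Usage sketch: criterion = UncertaintyWeightedLoss(); total = criterion([l_det, l_seg, l_drive])
```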

    Achelous: A Fast Unified Water-surface Panoptic Perception Framework based on Fusion of Monocular Camera and 4D mmWave Radar

    Full text link
    Current perception models for different tasks usually exist in modular forms on Unmanned Surface Vehicles (USVs) and infer extremely slowly in parallel on edge devices, causing asynchrony between perception results and the USV position and leading to erroneous decisions in autonomous navigation. Compared with Unmanned Ground Vehicles (UGVs), robust perception for USVs has developed relatively slowly. Moreover, most current multi-task perception models are huge in parameters, slow in inference and not scalable. Motivated by this, we propose Achelous, a low-cost and fast unified panoptic perception framework for water-surface perception based on the fusion of a monocular camera and 4D mmWave radar. Achelous can simultaneously perform five tasks: detection and segmentation of visual targets, drivable-area segmentation, waterline segmentation and radar point cloud segmentation. Besides, models in the Achelous family, with fewer than around 5 million parameters, achieve about 18 FPS on an NVIDIA Jetson AGX Xavier, 11 FPS faster than HybridNets, and exceed YOLOX-Tiny and Segformer-B0 on our collected dataset by about 5 mAP50-95 and 0.7 mIoU, especially under adverse weather, dark environments and camera failure. To our knowledge, Achelous is the first comprehensive panoptic perception framework combining vision-level and point-cloud-level tasks for water-surface perception. To promote the development of the intelligent transportation community, we release our code at https://github.com/GuanRunwei/Achelous. Comment: Accepted by ITSC 202
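
    As an illustration only of how one framework can serve the five tasks listed above, a shared-encoder, multi-head skeleton might be organized as below. All module names, channel sizes, and the treatment of the radar input as a 2D map are assumptions; the real Achelous architecture is in the linked repository.

```python
import torch
import torch.nn as nn

class MultiTaskPerception(nn.Module):
    """Hypothetical skeleton of a unified perception model with shared
    image/radar encoders and one lightweight head per task. Not the actual
    Achelous architecture; see the linked repository for the real code."""

    def __init__(self, feat_dim: int = 64, num_classes: int = 8):
        super().__init__()
        self.image_encoder = nn.Sequential(nn.Conv2d(3, feat_dim, 3, 2, 1), nn.ReLU())
        self.radar_encoder = nn.Sequential(nn.Conv2d(3, feat_dim, 3, 2, 1), nn.ReLU())
        # One head per task; real heads would be full detection/segmentation decoders.
        self.det_head = nn.Conv2d(2 * feat_dim, num_classes + 4, 1)   # box offsets + classes
        self.seg_head = nn.Conv2d(2 * feat_dim, num_classes, 1)       # target segmentation
        self.drivable_head = nn.Conv2d(2 * feat_dim, 2, 1)            # drivable area
        self.waterline_head = nn.Conv2d(2 * feat_dim, 2, 1)           # waterline
        self.point_head = nn.Linear(2 * feat_dim, num_classes)        # radar point labels

    def forward(self, image, radar_map):
        f = torch.cat([self.image_encoder(image), self.radar_encoder(radar_map)], dim=1)
        pooled = f.mean(dim=(2, 3))  # crude global feature for the point-cloud head
        return {
            "detection": self.det_head(f),
            "segmentation": self.seg_head(f),
            "drivable": self.drivable_head(f),
            "waterline": self.waterline_head(f),
            "point_cloud": self.point_head(pooled),
        }
```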

    FindVehicle and VehicleFinder: A NER dataset for natural language-based vehicle retrieval and a keyword-based cross-modal vehicle retrieval system

    Full text link
    Natural language (NL) based vehicle retrieval is a task aiming to retrieve a vehicle that is most consistent with a given NL query from among all candidate vehicles. Because NL query can be easily obtained, such a task has a promising prospect in building an interactive intelligent traffic system (ITS). Current solutions mainly focus on extracting both text and image features and mapping them to the same latent space to compare the similarity. However, existing methods usually use dependency analysis or semantic role-labelling techniques to find keywords related to vehicle attributes. These techniques may require a lot of pre-processing and post-processing work, and also suffer from extracting the wrong keyword when the NL query is complex. To tackle these problems and simplify, we borrow the idea from named entity recognition (NER) and construct FindVehicle, a NER dataset in the traffic domain. It has 42.3k labelled NL descriptions of vehicle tracks, containing information such as the location, orientation, type and colour of the vehicle. FindVehicle also adopts both overlapping entities and fine-grained entities to meet further requirements. To verify its effectiveness, we propose a baseline NL-based vehicle retrieval model called VehicleFinder. Our experiment shows that by using text encoders pre-trained by FindVehicle, VehicleFinder achieves 87.7\% precision and 89.4\% recall when retrieving a target vehicle by text command on our homemade dataset based on UA-DETRAC. The time cost of VehicleFinder is 279.35 ms on one ARM v8.2 CPU and 93.72 ms on one RTX A4000 GPU, which is much faster than the Transformer-based system. The dataset is open-source via the link https://github.com/GuanRunwei/FindVehicle, and the implementation can be found via the link https://github.com/GuanRunwei/VehicleFinder-CTIM
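
    As a rough illustration of how NER output could drive keyword-based retrieval, BIO tags predicted for a query can be collapsed into an attribute query and then matched against detected vehicles. The entity labels below (COLOR, TYPE, ORIENTATION) are hypothetical placeholders, not necessarily the FindVehicle label scheme.

```python
def bio_to_attributes(tokens, tags):
    """Collapse BIO-tagged tokens into {entity_type: phrase} attributes.
    Entity types such as 'COLOR' or 'TYPE' are illustrative placeholders."""
    attributes, current_type, current_words = {}, None, []
    for token, tag in zip(tokens, tags):
        if tag.startswith("B-"):
            if current_type:
                attributes[current_type] = " ".join(current_words)
            current_type, current_words = tag[2:], [token]
        elif tag.startswith("I-") and current_type == tag[2:]:
            current_words.append(token)
        else:
            if current_type:
                attributes[current_type] = " ".join(current_words)
            current_type, current_words = None, []
    if current_type:
        attributes[current_type] = " ".join(current_words)
    return attributes

# Example query: "a red SUV heading north"
tokens = ["a", "red", "SUV", "heading", "north"]
tags   = ["O", "B-COLOR", "B-TYPE", "O", "B-ORIENTATION"]
print(bio_to_attributes(tokens, tags))  # {'COLOR': 'red', 'TYPE': 'SUV', 'ORIENTATION': 'north'}
```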

    WaterScenes: A Multi-Task 4D Radar-Camera Fusion Dataset and Benchmark for Autonomous Driving on Water Surfaces

    Full text link
    Autonomous driving on water surfaces plays an essential role in executing hazardous and time-consuming missions, such as maritime surveillance, survivor rescue, environmental monitoring, hydrographic mapping and waste cleaning. This work presents WaterScenes, the first multi-task 4D radar-camera fusion dataset for autonomous driving on water surfaces. Equipped with a 4D radar and a monocular camera, our Unmanned Surface Vehicle (USV) proffers all-weather solutions for discerning object-related information, including color, shape, texture, range, velocity, azimuth, and elevation. Focusing on typical static and dynamic objects on water surfaces, we label the camera images and radar point clouds at pixel level and point level, respectively. In addition to basic perception tasks, such as object detection, instance segmentation and semantic segmentation, we also provide annotations for free-space segmentation and waterline segmentation. Leveraging the multi-task and multi-modal data, we conduct numerous experiments on the single modalities of radar and camera, as well as the fused modalities. Results demonstrate that 4D radar-camera fusion can considerably enhance the robustness of perception on water surfaces, especially in adverse lighting and weather conditions. The WaterScenes dataset is publicly available at https://waterscenes.github.io
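
    A common step when fusing 4D radar points with a monocular camera is projecting the radar point cloud into the image plane so point-level and pixel-level labels can be related. The following is a generic pinhole-projection sketch; the calibration matrices are placeholders, not the WaterScenes sensor calibration.

```python
import numpy as np

def project_radar_to_image(points_xyz, extrinsic, intrinsic):
    """Project N radar points (x, y, z in the radar frame) to pixel coordinates.
    extrinsic: 4x4 radar-to-camera transform; intrinsic: 3x3 camera matrix.
    Both matrices here are placeholders, not real sensor calibration."""
    n = points_xyz.shape[0]
    homog = np.hstack([points_xyz, np.ones((n, 1))])   # N x 4 homogeneous points
    cam = (extrinsic @ homog.T)[:3]                     # 3 x N, camera frame
    in_front = cam[2] > 0                               # keep points ahead of the camera
    pix = intrinsic @ cam[:, in_front]
    pix = pix[:2] / pix[2]                              # perspective divide
    return pix.T, in_front                              # M x 2 pixel coords, validity mask

# Usage with placeholder calibration:
# uv, mask = project_radar_to_image(points, np.eye(4),
#                                   np.array([[800, 0, 640], [0, 800, 360], [0, 0, 1]]))
```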

    A large scale Digital Elevation Model super-resolution Transformer

    No full text
    The Digital Elevation Model (DEM) super-resolution approach aims to improve the spatial resolution or detail of an existing DEM by applying techniques such as machine learning or spatial interpolation. Convolutional Neural Networks and Generative Adversarial Networks have exhibited remarkable capabilities in generating high-resolution DEMs from corresponding low-resolution inputs, significantly outperforming conventional spatial interpolation methods. Nevertheless, these current methodologies encounter substantial challenges when tasked with processing exceedingly high-resolution DEMs (256×256, 512×512, or higher), specifically pertaining to accurately restoring the maximum and minimum elevation values, the terrain features, and the edges of DEMs. Aiming to solve the problems of current super-resolution techniques, which struggle to effectively restore topographic details and produce high-resolution DEMs that preserve coordinate information, this paper proposes an improved DEM super-resolution Transformer (DSRT) network for large-scale DEM super-resolution that accounts for geographic information continuity. We design a window attention module to engage more elevation points in low-resolution DEMs, which can learn more terrain features from the input high-resolution DEMs. A GeoTransform module is designed to generate coordinates and projections for the DSRT network. We conduct an evaluation of the network utilizing DEMs of various terrain types and elevation differences at resolutions of 64×64, 256×256 and 512×512. The network demonstrates leading performance across all assessments in terms of root mean square error (RMSE) for elevation, slope, aspect, and curvature, indicating that Transformer-based deep learning networks are superior to CNNs and GANs in learning DEM features.
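
    The role of the GeoTransform module (keeping coordinate information attached to the super-resolved DEM) can be illustrated with a small helper that rescales a GDAL-style geotransform when a raster is upsampled by an integer factor. This is a generic sketch under that assumption, not the paper's implementation.

```python
def upscale_geotransform(gt, scale):
    """Given a GDAL-style geotransform (origin_x, pixel_w, rot_x, origin_y, rot_y, pixel_h)
    and an integer super-resolution factor, return the geotransform of the upsampled DEM.
    The origin stays fixed; pixel size (and any rotation terms) shrink by the factor."""
    origin_x, pixel_w, rot_x, origin_y, rot_y, pixel_h = gt
    return (origin_x, pixel_w / scale, rot_x / scale,
            origin_y, rot_y / scale, pixel_h / scale)

# Example: a 30 m DEM tile upsampled 4x keeps its origin but gets 7.5 m pixels.
print(upscale_geotransform((500000.0, 30.0, 0.0, 4100000.0, 0.0, -30.0), 4))
```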

    RecepNet: Network with Large Receptive Field for Real-Time Semantic Segmentation and Application for Blue-Green Algae

    No full text
    Most high-performance semantic segmentation networks are based on complicated deep convolutional neural networks, leading to severe latency in real-time detection. However, the state-of-the-art semantic segmentation networks with low complexity are still far from detecting objects accurately. In this paper, we propose a real-time semantic segmentation network, RecepNet, which balances accuracy and inference speed well. Our network adopts a bilateral architecture (including a detail path, a semantic path and a bilateral aggregation module). We devise a lightweight baseline network for the semantic path to gather rich semantic and spatial information. We also propose a detail stage pattern to store optimized high-resolution information after removing redundancy. Meanwhile, effective feature-extraction structures are designed to reduce computational complexity. RecepNet achieves an accuracy of 78.65% mIoU (mean intersection over union) on the Cityscapes dataset in the multi-scale crop and flip evaluation. Its algorithmic complexity is 52.12 GMACs (giga multiply-accumulate operations) and its inference speed on an RTX 3090 GPU is 50.12 fps. Moreover, we successfully applied RecepNet to real-time detection of blue-green algae. We built and published a dataset consisting of aerial images of water surfaces with blue-green algae, on which RecepNet achieved 82.12% mIoU. To the best of our knowledge, our dataset is the world's first public dataset of blue-green algae for semantic segmentation.
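
    For reference, the mIoU figures quoted above are conventionally computed from a per-class confusion matrix; a minimal NumPy sketch, independent of RecepNet itself, is shown below.

```python
import numpy as np

def mean_iou(pred, target, num_classes):
    """Compute mean intersection-over-union from flat label arrays.
    Classes that never appear in prediction or ground truth are ignored."""
    conf = np.zeros((num_classes, num_classes), dtype=np.int64)
    for p, t in zip(pred.ravel(), target.ravel()):
        conf[t, p] += 1
    intersection = np.diag(conf)
    union = conf.sum(axis=0) + conf.sum(axis=1) - intersection
    valid = union > 0
    return (intersection[valid] / union[valid]).mean()

# Example with two classes on a tiny mask:
pred   = np.array([0, 0, 1, 1])
target = np.array([0, 1, 1, 1])
print(mean_iou(pred, target, num_classes=2))  # IoU(0)=0.5, IoU(1)=2/3 -> ~0.583
```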

    Functional IL-23R rs10889677 Genetic Polymorphism and Risk of Multiple Solid Tumors: a Meta-Analysis

    Get PDF
    Interleukin-23 receptor (IL23R) can interact with IL-23 and is thus involved in the T-helper 17 (Th17) cell-mediated inflammatory process as well as tumorigenesis. Recently, a functional single nucleotide polymorphism (SNP), rs10889677, has been identified in the 3’-untranslated region of IL-23R. It has been shown that the rs10889677AC SNP could increase the binding affinity of microRNA let-7f and downregulate IL-23R expression. Several case-control studies have examined the association between this SNP and genetic susceptibility to multiple solid tumors. However, the conclusions are conflicting. Therefore, we conducted this meta-analysis to systematically study the role of this functional IL-23R SNP in the development of multiple solid tumors. A total of 5 studies were eligible (6731 cases and 7296 healthy controls). Either a fixed-effect or a random-effect model was used to calculate pooled odds ratios (ORs) and 95% confidence intervals (95% CIs). A significant association between this functional rs10889677 genetic variant and the risk of multiple solid tumors was observed (CC genotype vs. AA genotype: OR = 0.59, 95% CI = 0.53-0.66, P < 0.001). These findings demonstrate that the IL-23R rs10889677 genetic variant might play an important part in the malignant transformation of multiple solid tumors.
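
    The pooled odds ratio reported above is typically obtained by inverse-variance weighting of per-study log odds ratios under a fixed-effect model; a minimal sketch of that calculation is below. The function takes per-study 2x2 counts as input; no study data from the meta-analysis is hard-coded here.

```python
import math

def pooled_or_fixed_effect(studies):
    """Fixed-effect (inverse-variance) pooled odds ratio with 95% CI.
    Each study is a 2x2 table (a, b, c, d):
      a = exposed cases, b = unexposed cases, c = exposed controls, d = unexposed controls.
    Counts must come from the actual studies; none are hard-coded here."""
    weights, log_ors = [], []
    for a, b, c, d in studies:
        log_or = math.log((a * d) / (b * c))
        var = 1 / a + 1 / b + 1 / c + 1 / d          # variance of the log odds ratio
        weights.append(1 / var)
        log_ors.append(log_or)
    pooled_log = sum(w * lo for w, lo in zip(weights, log_ors)) / sum(weights)
    se = math.sqrt(1 / sum(weights))
    ci = (math.exp(pooled_log - 1.96 * se), math.exp(pooled_log + 1.96 * se))
    return math.exp(pooled_log), ci
```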